Enhancing Memory Efficiency in Amazon ElastiCache for Redis and Amazon MemoryDB for Redis

Amazon MemoryDB for Redis and Amazon ElastiCache for Redis are powerful in-memory data stores. While ElastiCache functions primarily as a caching solution, MemoryDB is tailored for durable database needs, catering to applications with high-performance demands. As the volume of data accessed and stored continues to surge, optimizing memory utilization becomes crucial. In this article, I will share several strategies, complete with code snippets, aimed at minimizing your application's memory footprint when using MemoryDB and ElastiCache for Redis. This not only reduces costs but also lets you fit more data into your existing cluster.

Before diving into optimization techniques, it’s important to note that ElastiCache for Redis supports data tiering. This feature automatically distributes data between memory and local high-performance solid-state drives (SSD), making it ideal for applications that frequently access only up to 20% of their datasets. With this capability, ElastiCache for Redis allows you to scale your clusters affordably, handling up to a petabyte of data while potentially achieving over 60% savings per GB of capacity, all with minimal performance impact for workloads that regularly access a small subset of their data. Additionally, ElastiCache for Redis supports auto-scaling, which automatically adjusts your cluster horizontally by adding or removing shards or replica nodes.

Prerequisites

To follow along with this guide, you will need:

  • An AWS account (the AWS Free Tier is suitable)
  • An ElastiCache for Redis or MemoryDB cluster (a single instance will suffice)
  • Access to your local machine or a remote environment like AWS Cloud9 with connectivity to your cluster
  • The redis-cli client to connect remotely to your instance
  • Python version 3.5 or newer, along with the following libraries:
pip install redis-py-cluster  # to connect to your ElastiCache or MemoryDB cluster
pip install faker              # to simulate various types of data
pip install msgpack            # to serialize complex data in binary format
pip install lz4 pyzstd         # to compress data

To check the memory utilization, you can use the redis-cli command:

redis-cli -h <your_instance_endpoint> --tls -p 6379
>> memory usage "my_key"
(integer) 153

To connect to your Redis cluster from Python, use the redis-py-cluster library. Here's a simple connectivity check:

from rediscluster import RedisCluster

HOST = "<Your host URL>"
redis_db = RedisCluster(host=HOST, port=6379, ssl=True)
redis_db.ping()  # returns True if the connection is healthy
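
Once connected, you can check an item's footprint directly from Python as well; redis-py exposes the MEMORY USAGE command as memory_usage (assuming your client version includes it, and using the same example key as above):

redis_db.memory_usage("my_key")  # number of bytes Redis uses for the key and its value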

When executing multiple operations, consider using pipelines to batch commands and reduce network round trips, like so:

pipe = redis_db.pipeline()
pipe.set(key_1, value_1)
...
pipe.set(key_n, value_n)
pipe.execute()  # all buffered commands are sent in a single round trip

To estimate the size of an item before inserting it into Redis, you can use sys.getsizeof; note that it measures the Python object's in-memory size, which is only a rough proxy for what Redis will actually store:

import sys

x = 2  # x can be any Python object
sys.getsizeof(x)  # in-memory size of the object x, in bytes

For more realistic data simulation, the Faker library is useful:

from faker import Faker

fake = Faker()
fake.name()     # e.g., 'Lucy Cechtelar'
fake.address()  # e.g., '426 Jordy Lodge'

Basic Optimizations

Before delving into advanced techniques, it's worth applying a few straightforward basic optimizations. For our example, we consider a lengthy list of key-value pairs, where the keys represent the IP addresses of hypothetical visitors to our site, and the values encapsulate visit counts, names, and recent actions:

  • IP:123.82.92.12 → {"visits": "1", "name": "John Doe", "recent_actions": "visit,checkout,purchase"},
  • IP:3.30.7.124 → {"visits": "12", "name": "James Smith", "recent_actions": "purchase,refund"},
  • IP:121.66.3.5 → {"visits": "5", "name": "Peter Parker", "recent_actions": "visit,visit"}

You can insert these programmatically using:

redis_db.hset("IP:123.82.92.12", {"visits":"1", "name":"John Doe", "recent_actions": "visit,checkout,purchase"})
redis_db.hset("IP:3.30.7.124", {"visits":"12", "name":"James Smith", "recent_actions": "purchase,refund"})
redis_db.hset("IP:121.66.3.5", {"visits":"5", "name":"Peter Parker", "recent_actions": "visit,visit"})

Reduce Field Names

Since Redis stores field names with every hash, you can save space by adopting shorter ones: "visits" becomes "v," "name" becomes "n," and "recent_actions" becomes "r." The key prefix can likewise be shortened from "IP" to "i," and recurring action values can be encoded as single characters (for example, "visit,checkout,purchase" becomes "vcp"), resulting in the following (see the sketch after this list):

  • i:123.82.92.12 → {"v": "1", "n": "John Doe", "r": "vcp"},
  • i:3.30.7.124 → {"v": "12", "n": "James Smith", "r": "pr"},
  • i:121.66.3.5 → {"v": "5", "n": "Peter Parker", "r": "vv"}
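
As a sketch, the same inserts with shortened key, field, and action names look like this (reusing the redis_db connection from earlier):

redis_db.hset("i:123.82.92.12", mapping={"v": "1", "n": "John Doe", "r": "vcp"})
redis_db.hset("i:3.30.7.124", mapping={"v": "12", "n": "James Smith", "r": "pr"})
redis_db.hset("i:121.66.3.5", mapping={"v": "5", "n": "Peter Parker", "r": "vv"})

# Compare the footprints of the long and short versions
redis_db.memory_usage("IP:123.82.92.12"), redis_db.memory_usage("i:123.82.92.12")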

This optimization leads to a memory savings of 23% in our example.

Use Position to Indicate Data Types

If every entry always contains the same fields, consider using a list instead of a hash. The position of each element in the list indicates its meaning, allowing you to eliminate field names altogether (see the sketch after this list):

  • i:123.82.92.12 → [1, "John Doe", "vcp"],
  • i:3.30.7.124 → [12, "James Smith", "pr"],
  • i:121.66.3.5 → [5, "Peter Parker", "vv"]
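
A minimal sketch with Redis lists, where RPUSH appends the values in their agreed-upon order (index 0 = visits, 1 = name, 2 = recent actions):

# Note: delete any existing hash stored under the same key first,
# or RPUSH will fail with a WRONGTYPE error
redis_db.rpush("i:123.82.92.12", 1, "John Doe", "vcp")
redis_db.rpush("i:3.30.7.124", 12, "James Smith", "pr")
redis_db.rpush("i:121.66.3.5", 5, "Peter Parker", "vv")

redis_db.lrange("i:121.66.3.5", 0, -1)  # read back all positional values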

This adjustment can yield an additional 14% in memory savings.

Serialize Complex Types

Various serialization methods can efficiently store complex objects. Many programming languages offer their own serialization libraries (like pickle in Python), while some, such as ProtoBuf or MsgPack, are cross-language and often more space-efficient. Here's a MsgPack example:

import msgpack

def serialize(data: object) -> bytes:
    # Pack the object into MsgPack's compact binary format
    return msgpack.packb(data, use_bin_type=True)

def write(key, value):
    key_bytes = b'i:' + serialize(key)  # serialize the key as well
    value_bytes = serialize(value)
    redis_db.set(key_bytes, value_bytes)

write([121, 66, 3, 5], [134, "John Doe", "vcp"])

In this instance, the original object size was 73 bytes, while the serialized object came to 49 bytes, a 33% space reduction. Retrieving the value is just as straightforward:

def deserialize(data: bytes) -> object:
    return msgpack.unpackb(data, raw=False)

def read(key):
    key_bytes = b'i:' + serialize(key)  # rebuild the exact key bytes that write() stored
    value_bytes = redis_db.get(key_bytes)
    return deserialize(value_bytes)

# Now we can recover the value object
value = read([121, 66, 3, 5])  # [134, 'John Doe', 'vcp']
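
The lz4 and pyzstd packages from the prerequisites can shrink serialized values further before they are written to Redis. Here is a minimal sketch; note that compression pays off mostly on larger, repetitive values, and tiny payloads like the one above may not shrink at all:

import lz4.frame
import pyzstd

packed = serialize([134, "John Doe", "vcp"])

# LZ4: very fast, with a moderate compression ratio
lz4_bytes = lz4.frame.compress(packed)
assert lz4.frame.decompress(lz4_bytes) == packed

# Zstandard: slower, but usually a better ratio
zstd_bytes = pyzstd.compress(packed)
assert pyzstd.decompress(zstd_bytes) == packed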

Redis-Specific Optimizations

To support features such as fast access and TTLs, Redis carries some memory overhead beyond the data itself. The next sections explore strategies to minimize this overhead and introduce probabilistic structures that conserve memory even further.
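
As a preview, Redis's built-in HyperLogLog is one such probabilistic structure: it estimates unique counts (for example, distinct visitor IPs) in at most about 12 KB per key, no matter how many elements you add:

redis_db.pfadd("unique_visitors", "123.82.92.12", "3.30.7.124", "121.66.3.5")
redis_db.pfcount("unique_visitors")  # approximate number of distinct IPs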

Switching from strings or lists to hashes is advisable in many cases, because Redis stores small hashes in a particularly compact encoding; a sketch of this bucketing pattern follows.
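
For illustration, here is a minimal sketch of the pattern, assuming many small string key-value pairs; the helper names and the bucket count of 1,000 are hypothetical choices for this example:

import zlib

NUM_BUCKETS = 1000  # arbitrary; tune so each hash stays small

def bucket_for(key: str) -> str:
    # crc32 gives a stable hash across processes (Python's built-in hash() does not)
    return f"bucket:{zlib.crc32(key.encode()) % NUM_BUCKETS}"

def bucketed_set(key: str, value: str):
    # Store the pair as a field inside a shared hash instead of a top-level key
    redis_db.hset(bucket_for(key), key, value)

def bucketed_get(key: str):
    return redis_db.hget(bucket_for(key), key)

bucketed_set("IP:123.82.92.12", "vcp")
bucketed_get("IP:123.82.92.12")  # b'vcp' (bytes, unless decode_responses is set)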

